Similar resources
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Dense video captioning is a newly emerging task that aims at both localizing and describing all events in a video. We identify and tackle two challenges on this task, namely, (1) how to utilize both past and future contexts for accurate event proposal predictions, and (2) how to construct informative input to the decoder for generating natural event descriptions. First, previous works predomina...
Video Captioning with Multi-Faceted Attention
Recently, video captioning has been attracting an increasing amount of interest, due to its potential for improving accessibility and information retrieval. While existing methods rely on different kinds of visual features and model structures, they do not fully exploit relevant semantic information. We present an extensible approach to jointly leverage several sorts of visual features and sema...
Multi-Task Video Captioning with Video and Entailment Generation
Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generati...
Video Fire Detection Algorithm using Multi-Feature Fusion
At present, moving target detection and flame characteristics extraction have become nearly the most important parts of the majority of video fire detection systems. Building on these two components, a new fire feature detection method is presented that operates within a precisely located moving target area. That is, using the improved background difference method and flame features (such as the color and uniformity, Wavelet en...
Reconstruction Network for Video Captioning
In this paper, the problem of describing visual contents of a video sequence with natural language is addressed. Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (...
Journal
Journal title: International Journal of Computer Applications
Year: 2019
ISSN: 0975-8887
DOI: 10.5120/ijca2019918660